Model Report for `dataset_1`
generated on 26 Mar 2024, 23:46
Accuracy: 87.9% (98.6%)
[Plots not reproduced: Correlations, Univariate Distributions, Bivariate Distributions]
Accuracy
| Column | Univariate | Bivariate |
|---|---|---|
| PWF | 100.0% | 91.3% |
| RNF | 99.9% | 91.4% |
| HDF | 99.8% | 91.3% |
| OSF | 99.8% | 91.3% |
| TWF | 99.8% | 91.3% |
| Type | 99.6% | 90.9% |
| Machine failure | 99.4% | 91.0% |
| Tool wear [min] | 99.0% | 89.4% |
| UDI | 98.9% | 89.6% |
| Process temperature [K] | 98.5% | 89.3% |
| Air temperature [K] | 98.4% | 89.2% |
| Rotational speed [rpm] | 98.0% | 88.9% |
| Torque [Nm] | 97.5% | 88.6% |
| Product ID | 0.0% | 0.0% |
| Total | 92.0% | 83.8% |
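The Total row and the headline accuracy appear consistent with plain means over the table. The sketch below only checks that arithmetic using the values copied from the table. That the report computes its totals this way is an assumption drawn from the numbers, not something the report states, and the variable names are illustrative.

```python
# Per-column accuracies copied from the table (in %), Product ID included.
univariate = [100.0, 99.9, 99.8, 99.8, 99.8, 99.6, 99.4,
              99.0, 98.9, 98.5, 98.4, 98.0, 97.5, 0.0]
bivariate = [91.3, 91.4, 91.3, 91.3, 91.3, 90.9, 91.0,
             89.4, 89.6, 89.3, 89.2, 88.9, 88.6, 0.0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


uni_total = mean(univariate)             # mean over all 14 columns
bi_total = mean(bivariate)
overall = (uni_total + bi_total) / 2     # assumed: average of the two totals

print(f"{uni_total:.1f} {bi_total:.1f} {overall:.1f}")  # 92.0 83.8 87.9
```

Under this reading, the 0.0% for Product ID is what pulls the totals down: the univariate mean over the other 13 columns would be about 99.1%.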
Explainer
Accuracy of synthetic data is assessed by comparing the distributions of the synthetic data (shown in green) with those of the original data (shown in gray).
For each distribution plot, the deviations are summed across all categories to obtain the total variation distance (TVD). The reported accuracy is then 100% - TVD.
These accuracies are calculated for all univariate and bivariate distributions. The final accuracy score is the average across all of them.
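As a rough illustration of the TVD-based accuracy described above, the sketch below compares two small categorical samples. It assumes the standard definition of TVD (half the sum of absolute differences between category shares). The report does not spell out its exact formula or how numeric columns are binned, so the function name `tvd_accuracy` and the sample values are hypothetical.

```python
from collections import Counter


def tvd_accuracy(original, synthetic):
    """Return 1 - TVD between two categorical samples, as a fraction in [0, 1]."""
    p = Counter(original)
    q = Counter(synthetic)
    n_p, n_q = len(original), len(synthetic)
    categories = set(p) | set(q)
    # Assumed definition: TVD = 0.5 * sum over categories of |share_p - share_q|.
    tvd = 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)
    return 1.0 - tvd


# Made-up samples of a categorical column such as Type.
original = ["L", "L", "L", "M", "M", "H"]
synthetic = ["L", "L", "M", "M", "M", "H"]

acc = tvd_accuracy(original, synthetic)
print(f"accuracy = {acc:.1%}")
```

For a bivariate distribution, the same function could be applied to pairs of values, for example `list(zip(col_a, col_b))`, so that each distinct pair is treated as one category.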
Distances
Identical Matches: 0.0% (0.0%)
Average Distances: 1.92 (1.87)
Explainer
Synthetic data should be close to the original data, but not too close, in order to preserve the confidentiality of the original samples.
This can be assessed by checking for exact matches between synthetic and original data, and by measuring the distance from each synthetic record to its closest original record.
These statistics are then compared against the same statistics observed within the original data itself and tested for statistical significance. In addition, their distributions are visualized above, with the distances for the synthetic data shown in green and those for the original data shown in gray.
A green line that lies significantly to the left of the gray line in the cumulative distribution plots implies that the generated data is too close to the actual records.
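The two distance statistics can be sketched as follows. This is a minimal illustration, not the report's implementation: it assumes purely numeric records and Euclidean distance, and it uses a leave-one-out nearest neighbour within the original data as the baseline. The report does not state its distance metric, its encoding of categorical columns, or how the baseline is drawn, and the function name and toy records are hypothetical.

```python
import math


def closest_distances(queries, references, skip_self=False):
    """For each query record, return the distance to its nearest reference record.

    With skip_self=True, the record at the same index is ignored, which gives
    a leave-one-out baseline when queries and references are the same list.
    """
    result = []
    for i, q in enumerate(queries):
        best = math.inf
        for j, r in enumerate(references):
            if skip_self and i == j:
                continue
            best = min(best, math.dist(q, r))  # Euclidean distance (assumed metric)
        result.append(best)
    return result


# Toy records; real data would need encoding and scaling first.
original = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
synthetic = [(0.0, 1.0), (1.0, 1.0), (3.0, 3.0)]

syn_dcr = closest_distances(synthetic, original)
orig_dcr = closest_distances(original, original, skip_self=True)

# Share of synthetic records that exactly match an original record.
identical_share = sum(d == 0.0 for d in syn_dcr) / len(syn_dcr)
avg_syn = sum(syn_dcr) / len(syn_dcr)
avg_orig = sum(orig_dcr) / len(orig_dcr)

print(f"identical matches: {identical_share:.1%}")
print(f"average distance: {avg_syn:.2f} (baseline {avg_orig:.2f})")
```

In this toy example one synthetic record coincides with an original record and the synthetic average distance falls below the baseline, which is the pattern the explainer flags as too close. The brute-force loop is quadratic in the number of records; a nearest-neighbour index would be the usual choice for larger tables.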